Dig Deeper - Microbial Community Analysis

1 Introduction

Biodiversity loss, ecosystem degradation, and habitat destruction are increasingly linked to human-driven changes in land use, including urbanisation, agriculture, and the exploitation of natural resources (European Parliament, 2025; Jaureguiberry et al., 2022). In response, governments across Europe — including the EU — have introduced ambitious environmental strategies such as the EU Biodiversity Strategy for 2030 (European Parliament, 2025) and the 30x30 target (Markwick, 2023), which aims to protect 30% of land and sea by the year 2030.

Ecological restoration plays a vital role in addressing these challenges. Rather than simply returning ecosystems to a previous state, modern approaches focus on restoring ecological processes and enhancing ecosystem resilience (Hicks, 2023).

1.1 The RestREco Initiative

RestREco (Restoring Resilient Ecosystems) is a NERC-funded research project that adopts a resilience-based perspective on ecological restoration. The initiative brings together researchers from:

  • Cranfield University
  • University of Stirling
  • UK Centre for Ecology & Hydrology
  • The National Trust
  • Forest Research

Using a natural experiment design, RestREco studies a network of 133 ecological restoration sites across England and Scotland. The project aims to identify key drivers of ecosystem development, such as:

  • Time since restoration began
  • Initial ecological conditions
  • Proximity to existing woodland and grassland

The goal is to understand how these factors influence ecosystem complexity, function, and resilience to future pressures (RestREco, 2024).

1.2 The Dig Deeper Study

As part of the RestREco initiative, the Dig Deeper study focused on how the age of restoration, establishment type, and site management affect soil microbial communities, specifically bacteria and fungi.

To explore this, high-throughput sequencing was conducted on:

  • 16S rRNA gene (for bacterial communities)
  • ITS region (for fungal communities)

The analysis focused on three main aspects:

  • Alpha and beta diversity
  • Taxonomic composition
  • Functional diversity

These microbial assessments complement broader ecosystem-level measurements within the RestREco project, including vegetation, invertebrates, and ecosystem functions such as litter decomposition, pollination services, and soil thermodynamic efficiency.

The following sections describe the sampling design, metadata structure, and the processing pipeline used to characterise microbial communities.


2 Sample Design and Metadata Overview

2.1 Sample Collection and Geographic Coverage

For each marker, a total of 330 soil samples were collected from 66 sites in England (5 samples per site).

Figure 2. Sample Zone - Based on GPS coordinates

Overview of the soil sampling and sequencing strategy for each microbial marker.

Metric                   | 16S           | ITS
-------------------------|---------------|--------------
Microbial group          | Bacteria      | Fungi
Region sampled           | England       | England
Number of sites          | 66            | 66
Samples per site         | 5             | 5
Total samples            | 330           | 330
Average reads per sample | ~65,000       | ~65,000
Read count range         | 30,000–85,000 | 10,000–90,000

Figure 3. Sampling Summary



2.2 Metadata Overview

Each sample collected was accompanied by metadata capturing key environmental and management variables. These contextual factors were essential for interpreting variation in microbial diversity.

Description of metadata variables associated with soil samples

Variable           | Description
-------------------|------------------------------------------------------
Site               | Name of the sampling site
Plot number        | Subdivision of each site (usually 5 plots per site)
CU Code            | Unique code for each sample
Year_est           | Year of establishment of the site
Age                | Site age (ranging from 1 year to over 100 years)
Latitude/Longitude | GPS coordinates of the sample
Establishment      | Restoration type or land management
pH                 | Soil pH value at the time of sampling
EC                 | Electrical conductivity of the soil
Cutting            | Whether the site is cut (1 = Yes, 0 = No)
Cattle             | Presence of cattle grazing (1 = Yes, 0 = No)
Sheep              | Presence of sheep grazing (1 = Yes, 0 = No)
Plough             | Whether the soil has been ploughed (1 = Yes, 0 = No)

Figure xx. Metadata Summary

3 Analysis Pipeline

3.1 Input Files Selection

Before running the bioinformatic analyses, we selected a consistent set of input files for each sample. In both the ITS and 16S datasets, each sample was associated with several types of sequencing file. For example, sample GF677 had up to five files:

  • GF677_1.fastq.gz and GF677_2.fastq.gz (paired-end reads with barcode and primer removed)
  • GF677.raw_1.fastq.gz and GF677.raw_2.fastq.gz (raw, unprocessed reads)
  • GF677.extendedFrags.fastq.gz (merged forward and reverse reads)

To ensure consistency and avoid redundancy, we selected one file type per sample for downstream analysis. For the 16S dataset, we used the paired-end reads with barcode and primer removed (*_1.fastq.gz and *_2.fastq.gz). For the ITS dataset, we selected the raw reads (*.raw_1.fastq.gz and *.raw_2.fastq.gz), which were best suited to our quality filtering and denoising steps.
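The selection rule above can be sketched in a few lines of Python. This is only an illustration of the file-name logic, not the script used in the study: the function name `select_inputs` and the returned dictionary layout are assumptions. Note that a raw file name such as GF677.raw_1.fastq.gz also ends in _1.fastq.gz, so the raw suffix has to be checked first.

```python
def select_inputs(filenames, marker):
    """Pick one paired-end file type per sample (hypothetical helper).

    16S -> primer-trimmed pairs (*_1.fastq.gz, *_2.fastq.gz)
    ITS -> raw pairs (*.raw_1.fastq.gz, *.raw_2.fastq.gz)
    Merged reads (*.extendedFrags.fastq.gz) are never selected.
    """
    if marker not in ("16S", "ITS"):
        raise ValueError("marker must be '16S' or 'ITS'")
    want_raw = marker == "ITS"
    selected = {}
    for name in filenames:
        for direction in ("1", "2"):
            raw_suffix = f".raw_{direction}.fastq.gz"
            trimmed_suffix = f"_{direction}.fastq.gz"
            # Check the raw suffix first: it also matches the trimmed pattern.
            if name.endswith(raw_suffix):
                is_raw, suffix = True, raw_suffix
            elif name.endswith(trimmed_suffix):
                is_raw, suffix = False, trimmed_suffix
            else:
                continue
            if is_raw == want_raw:
                sample = name[: -len(suffix)]
                selected.setdefault(sample, {})[direction] = name
    return selected


files = [
    "GF677_1.fastq.gz",
    "GF677_2.fastq.gz",
    "GF677.raw_1.fastq.gz",
    "GF677.raw_2.fastq.gz",
    "GF677.extendedFrags.fastq.gz",
]
print(select_inputs(files, "16S"))
print(select_inputs(files, "ITS"))
```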

3.2 Denoising and Feature Table Construction

The sequencing data were processed using the QIIME 2 bioinformatics platform — a widely used tool for microbiome analysis. Raw amplicon reads were denoised using the DADA2 plugin, enabling accurate identification of amplicon sequence variants (ASVs) with single-nucleotide resolution. This step also included chimera removal and read quality trimming, ensuring high-confidence input for downstream analyses.

Following denoising, a feature table was constructed for each dataset (16S and ITS), summarising the number of sequences associated with each ASV across samples.
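To make the structure of a feature table concrete, the sketch below builds one from a toy input. It does not reproduce DADA2, which infers ASVs from the reads; it only shows the counting step, assuming each read has already been assigned to an ASV. The function name and the ASV and sample identifiers are hypothetical.

```python
from collections import Counter


def build_feature_table(denoised):
    """Build an ASV-by-sample count table.

    denoised: dict mapping sample ID -> list of ASV IDs, one entry per read
              (a toy stand-in for denoiser output).
    Returns a dict mapping ASV ID -> {sample ID: read count}.
    """
    per_sample = {sample: Counter(asvs) for sample, asvs in denoised.items()}
    all_asvs = sorted({asv for counts in per_sample.values() for asv in counts})
    # Counter returns 0 for ASVs that were not observed in a sample.
    return {
        asv: {sample: per_sample[sample][asv] for sample in denoised}
        for asv in all_asvs
    }


toy_reads = {
    "S1": ["asv_a", "asv_a", "asv_b"],
    "S2": ["asv_b", "asv_c"],
}
for asv, row in build_feature_table(toy_reads).items():
    print(asv, row)
```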

3.3 Diversity Analysis and Taxonomic Assignment

Following quality control and feature extraction, downstream analyses were performed to characterise microbial diversity and taxonomic composition.

Alpha and beta diversity metrics were computed using the diversity and emperor plugins in QIIME 2, enabling comparison of microbial communities across restoration gradients and site conditions.

Taxonomic classification of ASVs was then performed using trained classifiers against reference databases: Greengenes2 for bacterial 16S sequences and UNITE for fungal ITS sequences. This allowed each ASV to be annotated with its likely taxonomic lineage (from kingdom down to genus or species when possible).

TO BE COMPLETED/CHANGED

3.4 Summary of Pipeline


Figure xx. Workflow (16S)


4 QC and Pre-Processing

4.1 MultiQC on raw data

Bacteria (16S)

You can explore the full MultiQC report by clicking the image below:


Figure xx. MultiQC Plot (16S)

Fungi (ITS)

You can explore the full MultiQC report by clicking the image below:


Figure xx. MultiQC Plot (ITS)

4.2 QC after denoising

4.2.1 Statistics Table Summary

Bacteria (16S)

Here is a link to view the statistics after denoising in QIIME 2 (16S): Statistics after denoising (16S)

Figure xx. Statistics after denoising (16S)

Fungi (ITS)

Here is a link to view the statistics after denoising in QIIME 2 (ITS): Statistics after denoising (ITS)

Figure xx. Statistics after denoising (ITS)

4.2.2 QC Plots

Bacteria (16S)
Fungi (ITS)

Figure xx. QC plot (ITS)


5 Alpha Diversity

Alpha diversity refers to the variety of organisms within a particular sample or environment. It reflects both richness—the number of distinct taxa—and evenness—how evenly individuals are distributed among those taxa. One of the most widely used measures for assessing alpha diversity is the Shannon index.

The Shannon index takes into account not only the number of species present, but also how evenly their abundances are distributed. A higher Shannon value generally indicates a more diverse and ecologically balanced community.

Another important metric is Faith’s Phylogenetic Diversity (Faith PD), which measures the total branch length of the phylogenetic tree that spans the species in a sample. Unlike the Shannon index, Faith PD incorporates evolutionary relationships, providing a phylogenetic perspective on diversity.
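As a minimal illustration of Faith PD, the sketch below sums the branch lengths that connect the observed taxa to the root of a small, made-up tree. The tree, its branch lengths, and the function name are assumptions chosen for the example; real analyses use the phylogeny built from the ASV sequences.

```python
def faith_pd(parent_of, observed_tips):
    """Sum the branch lengths on the paths from the observed tips to the root.

    parent_of: dict mapping node -> (parent node, branch length to parent).
               The root has no entry.
    Each branch is counted once, even when several tips share it.
    """
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        while node in parent_of and node not in visited:
            visited.add(node)
            parent, length = parent_of[node]
            total += length
            node = parent
    return total


# Toy tree: root -> A -> (t1, t2), root -> t3
toy_tree = {
    "t1": ("A", 0.5),
    "t2": ("A", 0.3),
    "A": ("root", 1.0),
    "t3": ("root", 2.0),
}
print(faith_pd(toy_tree, ["t1", "t2"]))  # two close relatives share branch A
print(faith_pd(toy_tree, ["t1", "t3"]))  # two distant taxa span more of the tree
```

Both samples contain two taxa, but the second has the higher Faith PD because its taxa are more distantly related.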

We also include Pielou’s Evenness index, which specifically quantifies how equally individual organisms are distributed across taxa. While Shannon integrates both richness and evenness, this metric isolates the evenness component, providing a complementary view of diversity patterns.
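The relationship between the Shannon index and Pielou's Evenness can be sketched as follows. The example uses the natural logarithm, which is an assumption: the choice of base rescales Shannon values but does not affect evenness, because the base cancels in the ratio. The function names and counts are illustrative.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S), where S is the number of observed taxa."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0  # evenness is not informative for a single taxon
    return shannon(counts) / math.log(richness)


even_community = [25, 25, 25, 25]
uneven_community = [97, 1, 1, 1]
print(shannon(even_community), pielou_evenness(even_community))
print(shannon(uneven_community), pielou_evenness(uneven_community))
```

Both communities have the same richness (four taxa), so the difference in their Shannon values comes entirely from evenness.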

In the interactive plots below, we examine how the Shannon index, Faith PD and Evenness vary across different environmental and experimental conditions, separately for the 16S (bacteria and archaea) and ITS (fungi) datasets.

To allow interactive exploration of alpha diversity metrics across different environmental variables, we implemented a drop-down menu that dynamically displays the corresponding plots. Some variables, such as pH category, are only present in the ITS dataset, while others, like Year group, are specific to the 16S dataset. Internally, variables are mapped to their dataset-specific equivalents where needed (e.g. Age group in 16S becomes Age category in ITS). It is important to note, however, that these variables are not always directly comparable: for instance, Age group (16S) divides sites into multiple discrete intervals based on restoration age, while Age category (ITS) is a binary classification based on whether a site is above or below the median age. Despite these differences, the interface ensures that only available and relevant plots are shown for each selection.
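The difference between the two age variables can be sketched as below. The interval boundaries and labels used for the age groups are hypothetical, since the report does not state them, and the treatment of sites that fall exactly on the median is also an assumption.

```python
from bisect import bisect_right
from statistics import median


def age_category(ages):
    """Binary split around the median age (ITS-style variable)."""
    cutoff = median(ages)
    return ["above median" if a > cutoff else "at or below median" for a in ages]


def age_group(age, edges=(10, 30, 100)):
    """Discrete intervals (16S-style variable); the edges are hypothetical."""
    labels = ["0-9", "10-29", "30-99", "100+"]
    return labels[bisect_right(edges, age)]


site_ages = [1, 5, 12, 40, 120]
print(age_category(site_ages))
print([age_group(a) for a in site_ages])
```

A site aged 12 and a site aged 5 fall in the same binary category here but in different age groups, which is why the two variables are not directly comparable.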

For the ITS dataset, alpha diversity was computed per sample. To align it with the 16S analysis, where samples were already grouped by site, we aggregated the alpha diversity values by computing the mean per site. Categorical metadata were simplified by taking the most common (modal) value per site. This ensures consistency across datasets in the visual outputs. Users interested in the original, unaggregated sample-level data can explore the full QIIME 2 results via the links provided under each section.
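A minimal sketch of this aggregation step is shown below. The field names (`site`, `shannon`, `establishment`) and the example values are hypothetical, and ties between equally frequent categories are resolved by first occurrence, which is an assumption rather than the documented behaviour of the original script.

```python
from collections import Counter, defaultdict
from statistics import mean


def aggregate_by_site(rows, value_key, category_key, site_key="site"):
    """Collapse sample-level rows to one row per site.

    Numeric values are averaged; the categorical value is replaced by its mode.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[site_key]].append(row)
    result = {}
    for site, members in grouped.items():
        values = [r[value_key] for r in members]
        categories = [r[category_key] for r in members]
        result[site] = {
            value_key: mean(values),
            # most_common(1) returns the first-seen category on ties
            category_key: Counter(categories).most_common(1)[0][0],
        }
    return result


samples = [
    {"site": "A", "shannon": 4.0, "establishment": "seeded"},
    {"site": "A", "shannon": 5.0, "establishment": "seeded"},
    {"site": "A", "shannon": 6.0, "establishment": "natural"},
    {"site": "B", "shannon": 3.0, "establishment": "natural"},
]
print(aggregate_by_site(samples, "shannon", "establishment"))
```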

5.1 Shannon Index Boxplot

The boxplots below illustrate differences in Shannon diversity across groups. This metric reflects both species richness and how balanced the community is in terms of species abundance.

Bacteria (16S)

Kruskal-Wallis p-value: 0.000828

Here is a link to the full QIIME 2 results (16S): Shannon Index (16S)

Fungi (ITS)

Kruskal-Wallis p-value: 0.796

Here is a link to the full QIIME 2 results (ITS): Shannon Index (ITS)
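The p-values reported in this chapter come from Kruskal-Wallis tests, which compare the ranks of a diversity metric across groups. The sketch below is a standard-library illustration of that test with tie correction and a chi-square approximation for the p-value. It is not the QIIME 2 implementation, and the example values are made up.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q, steps = 1.0, math.exp(-half), (df - 2) // 2
    else:
        a, q, steps = 0.5, math.erfc(math.sqrt(half)), (df - 1) // 2
    # Recurrence for the regularised upper incomplete gamma function:
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    for _ in range(steps):
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups of numeric observations."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # average rank for tied values
        i = j
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    ties = Counter(pooled)
    correction = 1.0 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


h_stat, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat, p_value)
```

A small p-value indicates that at least one group tends to have higher or lower values than the others; it does not say which groups differ.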

5.2 Faith PD Boxplot

The following plots show Faith’s Phylogenetic Diversity, which integrates evolutionary relationships to capture how phylogenetically broad each microbial community is.

Bacteria (16S)

Kruskal-Wallis p-value: 0.0194

Here is a link to the full QIIME 2 results (16S): Faith PD (16S)

5.3 Evenness Boxplot

These boxplots display Pielou’s Evenness, highlighting how uniformly taxa are represented in each community. This separates the effect of dominance by a few taxa from the effect of richness.

Bacteria (16S)

Kruskal-Wallis p-value: 0.00576

Here is a link to the full QIIME 2 results (16S): Pielou Evenness (16S)

Fungi (ITS)

Kruskal-Wallis p-value: 0.512

Here is a link to the full QIIME 2 results (ITS): Pielou Evenness (ITS)

5.4 Spearman test (Bacteria - 16S)

Summary Table

Correlation Plots



6 Comparative Microbial Community Composition (Beta Diversity)

To explore differences in microbial communities, we often rely on dimensionality reduction techniques such as Principal Coordinates Analysis (PCoA), visualised through Emperor plots. Two commonly used distance metrics in this context are Bray-Curtis and Jaccard.

While both metrics can reveal meaningful clustering and separation in microbial data, they capture complementary aspects of community structure.

6.1 Emperor Plots

6.1.1 Bray-Curtis Emperor Plot

The Bray-Curtis Emperor plot is a 3D visualisation of microbial community dissimilarities between samples, based on the Bray-Curtis distance. This distance metric quantifies how different two samples are in terms of species abundance, taking into account both presence/absence and relative abundances. It does not incorporate evolutionary relationships between features.
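The Bray-Curtis dissimilarity between two samples can be computed directly from their count vectors, as in the sketch below. The function name and the counts are illustrative; in practice the values come from the rarefied feature table.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i).

    a, b: abundance vectors over the same ordered set of features.
    Returns 0 for identical samples and 1 for samples sharing no features.
    """
    if len(a) != len(b):
        raise ValueError("samples must cover the same features")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


sample_1 = [10, 0, 5]
sample_2 = [5, 5, 5]
print(bray_curtis(sample_1, sample_2))
```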

Using Principal Coordinates Analysis (PCoA), the high-dimensional Bray-Curtis distance matrix is projected into a lower-dimensional space—typically three axes—to capture the main patterns of variation across samples.
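The core of PCoA can be sketched without external libraries: the squared distance matrix is double-centred, and the leading eigenvector of the result gives the first axis. The example below recovers only that first axis using power iteration, which is a simplification chosen for brevity. Production tools compute all axes with a full eigendecomposition, and the function names here are hypothetical.

```python
import math


def double_center(dist):
    """Gower centring: B = -0.5 * J * D^2 * J for a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]  # equals column means by symmetry
    grand_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]


def first_pcoa_axis(dist, iterations=500):
    """Coordinates of each sample on the first principal coordinate."""
    b = double_center(dist)
    n = len(b)

    def multiply(v):
        return [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]

    # Arbitrary non-constant start vector; a constant vector would be mapped to zero.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = multiply(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, multiply(v)))  # Rayleigh quotient
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]


# Three samples lying on a line at positions 0, 1 and 3
toy_dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
axis_1 = first_pcoa_axis(toy_dist)
print(axis_1)
```

Because the toy distances are exactly one-dimensional, the spacing between the recovered coordinates reproduces the input distances. With real Bray-Curtis matrices, the first three axes capture only part of the variation.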

The Emperor plot is an interactive 3D tool developed for QIIME 2 that allows users to explore these PCoA results. Samples are represented as points in space, and their spatial proximity reflects ecological similarity:

  • Samples that are closer together have more similar microbial communities.
  • Samples that are further apart differ more strongly in community composition.

This type of plot is particularly useful for identifying clustering by experimental groups—such as treatment, site, or timepoint—and for detecting patterns or gradients in microbial diversity.

Bacteria (16S)

Figure 17. Bray-Curtis Emperor Plot

Here is a link to the Bray-Curtis Emperor Plot for more flexibility on QIIME2: Bray-Curtis Emperor Plot (16S)

Fungi (ITS)

Figure 17. Bray-Curtis Emperor Plot

Here is a link to the Bray-Curtis Emperor Plot for more flexibility on QIIME2: Bray-Curtis Emperor Plot (ITS)

6.1.2 Jaccard Emperor Plot

The Jaccard Emperor plot provides a 3D visualisation of microbial community dissimilarities based on the Jaccard distance. Unlike Bray-Curtis, the Jaccard metric considers only the presence or absence of features (e.g., microbial taxa) in each sample, ignoring their relative abundances.

This makes the Jaccard distance particularly suited for assessing community membership rather than abundance structure—focusing on which species are present, regardless of how abundant they are.
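The sketch below computes the Jaccard distance from presence/absence alone. The function name and counts are illustrative. The last two lines show that multiplying the abundance of a taxon leaves the distance unchanged as long as the same taxa are present.

```python
def jaccard_distance(a, b):
    """Jaccard distance: 1 - |shared features| / |features present in either sample|."""
    present_a = {i for i, count in enumerate(a) if count > 0}
    present_b = {i for i, count in enumerate(b) if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)


sample_1 = [10, 0, 5]
sample_2 = [5, 5, 5]
print(jaccard_distance(sample_1, sample_2))
print(jaccard_distance([1000, 0, 5], sample_2))  # same membership, same distance
```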

Using Principal Coordinates Analysis (PCoA), the high-dimensional Jaccard distance matrix is projected into a lower-dimensional space—usually three principal axes—to reveal major patterns in sample composition.

As with Bray-Curtis, the Emperor plot allows for interactive exploration of these ordinations:

  • Samples positioned closely together share more taxa in common (i.e., similar membership).
  • Samples far apart have fewer shared taxa, reflecting greater differences in species presence.

The Jaccard plot is useful when exploring factors that influence community membership, such as habitat type, land use, or environmental filtering—especially in studies where presence/absence patterns are more meaningful than relative abundances.

Bacteria (16S)

Figure 18. Jaccard Emperor Plot

Here is a link to the Jaccard Emperor Plot for more flexibility on QIIME2: Jaccard Emperor Plot (16S)

Fungi (ITS)

Figure 18. Jaccard Emperor Plot

Here is a link to the Jaccard Emperor Plot for more flexibility on QIIME2: Jaccard Emperor Plot (ITS)


7 Taxonomy Composition

7.1 Taxonomy Barplot

Bacteria (16S)

Figure 18. Taxonomy Barplot (16S)


Here is a link to the Taxonomy Barplots for more flexibility on QIIME2: Taxonomy Barplot (16S)

Fungi (ITS)

Figure 18. Taxonomy Barplot (ITS)


Here is a link to the Taxonomy Barplots for more flexibility on QIIME2: Taxonomy Barplot (ITS)

7.2 Krona Plots

To explore the composition of soil microbial communities, we used Krona plots — interactive, circular charts that display taxonomic abundances in a hierarchical manner.

These plots allow users to intuitively navigate from broader taxonomic levels (such as Phylum) to more specific ones (like Genus), while simultaneously comparing relative abundances across taxa.

In this study, Krona plots provide a powerful and user-friendly way to:

  • Visualise which microbial groups dominate each site
  • Explore the taxonomic diversity present in bacterial, archaeal, and fungal communities

You can click on the images below to access the Krona plots for each site.

Bacteria (16S)

Krona thumbnail

Figure 19. Krona Plot for Baltic_farm_1 (16S)

Fungi (ITS)

Krona thumbnail

Figure 19. Krona Plot for Baltic_farm_1 (ITS)

7.3 Differential Abundance Analysis with ANCOM

We used ANCOM to identify taxa whose relative abundances differed significantly across groups. This method accounts for the compositional nature of microbiome data by comparing log-ratios between taxa. The results are shown as volcano plots, in which the W statistic counts the number of log-ratio comparisons with other taxa in which a given taxon differed significantly between groups. Significant taxa are highlighted accordingly.
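The reason for working with log-ratios can be illustrated with a toy example. This is not ANCOM itself, only the idea behind it: when one taxon blooms, the relative abundance of every other taxon drops, but the log-ratio between two unaffected taxa stays the same. The taxon names, counts, and the `pseudocount` parameter are assumptions for the example.

```python
import math


def relative_abundance(counts, taxon):
    """Proportion of reads assigned to one taxon."""
    return counts[taxon] / sum(counts.values())


def log_ratio(counts, taxon_a, taxon_b, pseudocount=0.0):
    """Natural log of the ratio between two taxa; a pseudocount avoids log(0)."""
    return math.log((counts[taxon_a] + pseudocount) / (counts[taxon_b] + pseudocount))


before = {"A": 40, "B": 10, "C": 50}
after = {"A": 40, "B": 10, "C": 450}  # only taxon C has increased

print(relative_abundance(before, "A"), relative_abundance(after, "A"))
print(log_ratio(before, "A", "B"), log_ratio(after, "A", "B"))
```

Taxon A appears to decline when judged by relative abundance, although its count is unchanged; the A/B log-ratio is unaffected.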

Bacteria (16S)

Figure 18. Genus-Level Differences in Taxa Abundance by Establishment Type (ANCOM Volcano Plot) (16S)


Here is a link to the Volcano Plots for more flexibility on QIIME2: Volcano Plot (16S)

Fungi (ITS)

Figure 18. Differences in Taxa Abundance by Establishment Type (ANCOM Volcano Plot) (ITS)


Here is a link to the Volcano Plots for more flexibility on QIIME2: Volcano Plot (ITS)


8 Functional Diversity

8.1 Bacteria (16S)

8.1.1 Differentially Abundant Pathways

Figure 18. Functional Pathway Differences by Establishment (ANCOM Results) (16S)


Here is a link to the Volcano Plots for more flexibility on QIIME2: Volcano Plot (16S)

8.2 Fungi (ITS)

In microbial ecology, a guild refers to a group of organisms that fulfil similar ecological roles, regardless of their taxonomic identity. Understanding functional guilds allows researchers to move beyond taxonomic profiles and assess the ecological roles that microbial communities may play in an environment.

To investigate the ecological roles of fungal communities, we used FUNGuild, a tool that assigns fungi to functional guilds based on curated databases and literature. These guilds represent ecological strategies such as:

  • Saprotrophs: decomposers of organic matter
  • Mycorrhizal fungi: symbionts associated with plant roots
  • Pathogens: organisms that cause disease in plants or animals
  • Ectomycorrhizal fungi: symbionts that assist plant roots by enhancing nutrient and water uptake, especially nitrogen and phosphorus, while contributing to carbon cycling and soil nutrient mobilisation
  • Arbuscular mycorrhizal fungi: symbionts that contribute to nutrient uptake
  • Endophytes: fungi that live inside plant tissues and may promote the growth and development of the plant
  • Lichenized fungi: fungi that form symbiotic associations with algae or cyanobacteria, creating lichens that can survive in harsh environments by combining structural support and photosynthetic ability
  • Parasites: fungi that live on or inside a host organism, extracting nutrients and often causing harm
  • Symbiotrophs: fungi that engage in mutually beneficial relationships with host organisms, typically exchanging nutrients for resources like carbon or protection

This functional classification provides valuable insights into what fungi are likely doing in the ecosystem, beyond simply who they are.

The subsections below explore the functional roles of fungi within each site, based on the guild-level annotations provided by FUNGuild, and complement the taxonomic analyses presented earlier.

8.2.1 Top 20 Most Abundant Fungal Guilds

The plot below highlights the top 20 most abundant fungal guilds identified using FUNGuild. To avoid clutter, the guild names are hidden on the y-axis; however, users can hover over each bar to reveal the full name, enabling interactive and detailed exploration of fungal functional diversity.

Figure 20. Top 20 functional guilds

8.2.2 Fungal Guild Abundance Across Sites

The figure below shows the total abundance of fungal ASVs across sites, aggregated by functional guild. This provides an overview of how guild-level composition varies between locations, which may reflect differences in land use, soil conditions, or restoration histories.
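The aggregation behind this figure can be sketched as follows. The ASV identifiers, the ASV-to-guild mapping, and the second site name are made up for illustration, and ASVs without a guild annotation are grouped under an "Unassigned" label, which is an assumption about how such cases are handled.

```python
from collections import defaultdict


def guild_abundance_by_site(site_counts, asv_guild):
    """Sum ASV read counts per functional guild within each site.

    site_counts: dict mapping site -> {ASV ID: read count}
    asv_guild:   dict mapping ASV ID -> guild name (hypothetical annotations)
    """
    result = {}
    for site, asv_counts in site_counts.items():
        per_guild = defaultdict(int)
        for asv, count in asv_counts.items():
            per_guild[asv_guild.get(asv, "Unassigned")] += count
        result[site] = dict(per_guild)
    return result


toy_guilds = {
    "asv_1": "Saprotroph",
    "asv_2": "Ectomycorrhizal",
    "asv_3": "Saprotroph",
}
toy_counts = {
    "Baltic_farm_1": {"asv_1": 120, "asv_2": 30, "asv_4": 10},
    "Example_site_2": {"asv_2": 200, "asv_3": 15},
}
print(guild_abundance_by_site(toy_counts, toy_guilds))
```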

9 Significance Analysis